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I. INTRODUCTION 



Smeared link actions, actions where the traditional thin link gauge connection is replaced 
by some sort of smeared or fattened link, have gained popularity in recent years. Smearing 
removes the most ultra-violet gauge field fluctuations improving chiral and flavor symmetry 
in Wilson/clover and staggered fermion actions [||, |], ^, ||, ^. These actions also have 
better scaling and topological properties. While smearing is straightforward to implement 
on quenched conflgurations, the dynamical simulation of smeared actions is far from trivial. 
If the smearing is linear in the original gauge variables, numerical updates based on the 
molecular dynamics evolution is complicated but possible. On the other hand, if the smeared 
links are projected back to the gauge group and therefore are not linear in the thin link 
variables, algorithms that require the calculation of the fermionic force do not work. 

In a recent paper P| we suggested that smeared link actions can be effectively simulated 
with partial-global heatbath or over-relaxation updates combined with a Metropolis type 
accept-reject step that requires only a stochastic estimate of the fermionic determinant. In a 
subsequent publication [|] we studied the stochastic evaluation of the fermionic determinant. 
The most important conclusion of Ref. is that the standard deviation of the stochastic 
estimator can be very large, even divergent, thus introducing unacceptably large autocorre- 
lation times in simulations. We suggested two ways to improve the stochastic estimator. In 
this paper we apply those improvements in two and four flavor dynamical staggered action 
simulations and study the effectiveness of the partial-global stochastic Metropolis (PGSM) 
algorithm. Since the standard deviation of the stochastic estimator depends on what fraction 
of the original configuration is changed in one partial-global update, we investigate the auto- 
correlation time as the function of the number of links touched in the partial-global heatbath 
step. This knowledge is necessary to optimize the parameters of the PGSM algorithm and 
compare its effectiveness to other simulation methods. In our numerical simulation we use 
hypercubic smeared (HYP) links with staggered fermions, though many of the concepts 
and improvements of the PGSM algorithm can be directly applied to other kind of smeared 
links and fermionic formulations. 

To make this paper self-contained, in Sect. 2. we briefly summarize our smeared link 
action, the PGSM method and its improvements. In Sect. 3. we discuss the details of the 
updating and the parameters of the simulations. Sect. 4. contains our autocorrelation time 
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results. A short conclusion completes this paper. 

II. THE SMEARED LINK ACTION AND THE PARTIAL-GLOBAL UPDATE 

A. The action 

We consider a smeared link action of the form 

S = S,{U) + S,{V) + Sf{V), (1) 

where Sg{U) and Sg{V) are gauge actions depending on the thin links {U} and smeared links 
{V}, respectively, and 5*/ is the fermionic action depending on the smeared links only. We 
assume that the smeared links are constructed deterministically. In our numerical simulation 
we use hypercubic (HYP) blocking. The HYP links are optimized non-perturbatively to be 
maximally smooth, but they are also tree-level perturbative improved. The construction 
and properties of HYP smearing are discussed in detail in Ref. 
We use a plaquette gauge action for Sg{U) 

5,(f/) = -f EReTr(f/,), (2) 

and we will choose Sg{V) in Sect. II. D to improve computational efficiency. The fermionic 
action describing n/ degenerate flavors with staggered fermions is 

Sf{V) = -^\nDetn{V) (3) 

with 

Q{V) = (Mt(V^)M(\/)),.en,even (4) 

defined on even sites only. M{V) in Eq. || is the standard staggered fermion matrix. 

B. The partial-global stochastic Metropolis update 

In Refs. [|, H, 01 a partial-global stochastic updating (PGSM) algorithm was developed 
to simulate smeared link actions. First a subset of the thin links {U} are updated and a 
new thin gauge link configuration {U'} is proposed. The transition probability p{U, U') of 
the update satisfies detailed balance with Sg{U) 

p{U, t/')e-^«(^) = p{U', t/)e-^«(^'). (5) 
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The proposed configuration is accepted with the probability 

Pace = min{l,e-s-(v')+^.(V)e-r[f^-(v')f^(V)-i]?|^ (g) 

where the stochastic vector ^ is generated with Gaussian distribution, exp(— ^*^). The 
PGSM algorithm satisfies detailed balance |Q. A somewhat similar updating method has 
been proposed and investigated in Refs. |10|. The Kentucky Noisy Monte Carlo method 
proposes gauge fields created with a pure gauge global update but the accept reject step is 
based on the noisy evaluation of the fermionic determinant. So far that method has been 
tested with Wilson fermions at rather heavy quark masses. 



C. Improving the partial-global stochastic Metropolis update 

The PGSM algorithm averages the stochastic estimator of the fermionic determinant 
together with the gauge configuration ensemble and it can be efficient only if the standard 
deviation of the stochastic estimator is small. The standard deviation of the stochastic 
estimator as described by Eq. |^ is governed by the inverse determinant of the matrix 
{2Q^^{V')Q{V) — 1) and diverges if even one of the eigenvalues of Q^^{V')Q{V) is less than 
1/2 One can control the standard deviation by reducing the spread of the eigenvalues 
of the fermionic matrix Q{V). In Ref. we considered the combination of two separate 
methods to achieve that. Here we only briefiy summarize them. 



1. Reduction: The fermionic matrix reduction 0, |TT| removes the most UV part of the 
fermionic operator by defining a reduced matrix 

Qr = f^e-2^(^) (7) 

and rewriting the fermionic action as 

Sf{V) = -^(in Det nr{V) + 2Tif{n)). (8) 

If the function / is a polynomial of fl, the trace of f{fl) can be evaluated exactly 
and only the determinant of the reduced matrix fir has to be calculated stochastically. 
The parameters of / are chosen such as to minimize the eigenvalue distribution of the 
reduced fermionic matrix fir- In Ref. [§] we optimized f{^) for staggered fermions 
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assuming a simple eigenvalue distribution derived in the free field limit. The optimiza- 
tion procedure is easy to implement for any other fermionic action but the optimized 
coefficients will be different. For nearest neighbor staggered fermions with the choice 

f{n) = -0.34017 + 0.35645fi - 0.030379^]2 + 0.000957^^ (9) 

the conditioning number of the fermionic matrix is reduced by about a factor of 30. 
The smallest possible eigenvalue of the operator Q^^(y')Q{V) is now about 8(am)^. 
This is a significant improvement but it is not sufficient to guarantee a small or even 
finite standard deviation of the stochastic estimator. That can be achieved by an 
additional improvement step, the determinant breakup. 

2. Determinant breakup: Writing the fermionic action in the form 

Sf{V) = -^(n^ln Det fi^^H^) + 2Tr/(^])) (10) 

suggests that the stochastic part of the estimator can be evaluated using UbUf/A inde- 
pendent Gaussian vectors as 

Pace = mm{l,e-'^^^+^^^e-^'^=^^'^'^*^^^'^'^''^^''>^'^^^ (11) 

Here ASg = Sg{V') - Sg{V), Af = Tr/(fi') - Tr/(fi), and the argument of the 
stochastic estimator is written in an explicitly Hermitian form. The standard deviation 
of the stochastic estimator is greatly reduced with the determinant breakup, it is now 
finite as long as the matrix Q^^ {V')Qr(y) has no eigenvalue smaller than 1/2"''. With 
non-zero quark mass rii, can always be chosen such that this condition is satisfied. 
Approximating the lowest eigenvalue of Q~^(y')Qr(y) as 8(am)^, Ub = 4: and 8 is 
barely sufficient with am = 0.1 and 0.04. A safer choice is to use = 8 and 12, 
respectively. 

The determinant breakup reduces the statistical fiuctuations of the stochastic estimator 
by taking the sum of n^nf/A smaller terms instead of n//4 original terms in the exponent. 
While taking the average of several stochastic estimates in the acceptance step of the original 
PGSM algorithm (Eq. ||) violates the detailed balance condition, the determinant breakup 
procedure is still exact. 

An added bonus of the determinant breakup is that now simulating arbitrary number 
of degenerate or non-degenerate fiavors is straightforward as long as Uh is chosen to be a 
multiple of 4. 
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D. The smeared link action with reduced fermion matrix 



The reduction of the fermionic matrix is compensated by an effective smeared hnk gauge 
action — ^Tr/(n). With the choice of / a cubic polynomial and the standard nearest 
neighbor staggered action Tr / is the sum of 4- and 6-link gauge loops and can be evaluated 
exactly @. However, even this calculation can be avoided if we choose SgiV) so it cancels 
Tr/(n). With the choice 

S,{V) = ^(2Tr/(fi) -lY^ReTviV,)) (12) 



p 



the dynamical action simplifies to 

^ = - f E Re Tr(t/,) - 1^ E Re TviV,) - ^ In Det n^iV) (13) 

with P and 7 two tunable gauge action parameters. In the PGSM algorithm the new 
gauge configuration is proposed with the thin link pure gauge action and is accepted with 
probability 



Pace = min{l,e-^s}, (14) 

12 



= -^E(ReTr(\/;) - ReTriVp)) (15) 



p 

i=l 



Eqs. |T3| and |TJ give the final form of our action and the stochastic estimator. One should 
note that the action defined in Eq. [1^ depends on the parameters of the reduction function 
/. The dependence is weak, and since the coefficients in / are fixed, they become negligible 
in the continuum limit. 

When expressed in terms of gauge loops, Tr /(f2) contains 4- and 6-link loops. On Nt = 4 
and 6 lattices that includes Polyakov lines, loops wrapping around the temperature direction 
of the lattice. In Refs. ||5|, ||, 0] we suggested to remove these terms by explicitly including 
them in Sg{V). This removal is by no means necessary but might infiuence finite volume 
effects. 
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/3 


7 


am 


a[fm] 


< Trf/p > 


5.65 


-0.10 


0.10 


0.165(1) 


1.6212(4) 


5.65 


-0.14 


0.06 


0.161(1) 


1.6216(5) 


5.65 


-0.15 


0.04 


0.160(1) 


1.6216(4) 


5.65 





oo 


~0.19 


1.6128(3) 



Table I: The parameters and lattice spacing of the nj = 2 runs. The last column shows the matching 
of the plaquette expectation values between the dynamical and pure gauge {am = oo) simulations. 
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7 


am 


a[fm] 


<TtUp> 


5.65 


-0.10 


0.10 


0.173(2) 


1.6033(4) 


5.65 


-0.15 


0.04 


0.165(3) 


1.5913(4) 


5.65 





oo 


~0.19 


1.6128(3) 



Table II: Same as table | but for the nj = A flavor runs. 
III. NUMERICAL SIMULATIONS WITH THE HYP ACTION 
A. Tuning the simulation parameters 

Our final gauge action has three free parameters: in addition to the quark mass, there 
are two gauge couplings, /3 and 7, that can be tuned independently. Since the term of the 
action proportional to 7 is included in the acceptance step, we want to keep it small. Also, a 
finite 7 coefficient in the continuum /3 — > cxd limit can be neglected and perturbative results 
that were obtained with Sg{V) = remain valid. On the other hand a negative 7 coupling 
allows the increase of the coupling /3 in Sg{U), compensating for the renormalization of the 
gauge coupling due to the fermions. Ideally one would like to choose (3 such that the gauge 
action proposed with Sg{U) matches the configurations of the dynamical action either at 
long distance (lattice spacing), short distance (plaquette), or, ideally both. We found that 
choosing the coefficient of the thin link pure gauge action somewhat smaller than what would 
match the desired lattice spacing of the dynamical system matches the plaquette expectation 
values and provides a good parameter set for the dynamical system. 

Table |, where we collected the parameter values of our Uf = 2 fiavor runs, illustrates 
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this point. All simulations were done on 8^ x 24 lattices and we tried to tune the lattice 



spacing, measured by the Sommer parameter tq |[T^, to be a = 0.16 — 0.17 fm. We chose the 
coupling of Sg{U) to he P = 5.65 and kept it fixed, while tuned the coefficient 7 with the 
quark mass. As the last column of table | shows, the plaquette expectation value is almost 
the same on the dynamical and pure gauge configurations. The matching works similarly in 
case oi Uf = 4 fiavors as table |1| illustrates. The pion to rho mass ratios for both the two 
and four fiavor runs vary between 0.55 and 0.70, consistent with a renormalization factor 

Zm ~ 1- 

The matching of the plaquette expectation value with the parameters of tables |JT| are 
almost perfect, but the lattice spacing between the dynamical and pure gauge runs differ by 
10-15%. Our attempt to increase the thin link gauge coupling to /? = 5.7 and decrease 7 to 
achieve the same lattice spacing failed. As we decreased 7 the simulations showed sudden, 
large fiuctuations, signaling perhaps a phase transition. We have not yet explored the full 
phase diagram, rather we decided to use a somewhat mismatched coupling. 

Since the term 2Tr/(f2) in Sg{V) compensates for some of the gauge coupling renormal- 
ization, the coefficient 7 is fairly small at every quark mass we considered. At different (3 
couplings and mass values one could choose different 7 couplings. However as the role of the 
7 term in the action is to compensate for the renormalization of the pure gauge coupling 
due to the dynamical quark mass, it is natural to fix 7 as the function of the quark mass, 
independent of /3. As the shift of the gauge coupling between pure gauge and dynamical 
actions remains finite at am = 0, one can choose 7 finite and small even in the chiral limit. 



B. The polynomial approximation 

To assure the finiteness of the standard deviation of the stochastic estimator it is imper- 
ative to break up the determinant as in Eq. [1^. With quark masses am = 0.1 — 0.04 we 
need rif, = 8 — 12 determinant break up to keep the standard deviation small. That im- 
plies that in the stochastic estimator we need the nl^ root of the fermionic matrix, the 2nl^ 
root of its inverse. To calculate the n*^ root of the matrices involved we use a polynomial 
approximation. This approach was originally developed to approximate the inverse of the 
fermion matrix ||T3| , 0] and has been used extensively with the polynomial Hybrid Monte 



Carlo method. In our case it is most efficient to approximate the appropriate power of the 
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reduced fermionic matrix Qr 



(16) 



where P^" and Qi are k and / order polynomials of the fermionic matrix fl. To accomplish 
this we determine polynomials P^^*"^ and Q;"*^ that approximate 



on the entire spectrum of fl. In general, to determine a polynomial T^. that approximates a 
function g (x) on the interval [Aq, Ai] we minimize the "distance": 



with respect to tk, the coefficients of the polynomial T^. The weight function p (x) is chosen 
in accordance to the problem's requirements. In our case it will be, ideally, the spectral 
density of the operator fl. The advantage of this procedure is that the minimization turns 
into a linear problem. 

The first step is to determine the spectral bounds of Q. The lower bound is Aq = 4m^, 
but the upper bound determination is slightly more complicated. The only firm bound for 
p ^ that we are aware of is 64. However, in our simulations we have seen that the highest 
eigenvalue of on thermalized configurations is around ~ 20 when f2 is defined in terms of 
thin links, and ~ 16 when fl is defined in terms of the HYP links. We have chosen for our 
upper bound Ai = 16.4 to be on the safe side. As a rule of the thumb, we observed that 
the highest eigenvalue for a particular operator p^, when defined on the HYP links, is just 
slightly higher than the one the operator assumes in the free field case. Also, it is worth 
mentioning that if we choose too low a boundary for our polynomial approximation (lower 
than the highest eigenvalue of fl) the algorithm starts behaving erratically. 

Once the bounds are set we focus on the weight function. The best choice for the weight 
function is the spectral density of fl. However, this density is difficult to determine in 
the general case. A more reasonable option is to use the spectral density in the free field 
case. Using different weight functions we observed that they do not alter the polynomial 
coefficients dramatically and, thus, it is better to choose a weight function that is more 




(17) 




(18) 
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Figure 1: The accuracy level for the P polynomials with different polynomial orders and breakup 
levels: open squares rii, = 1, filled squares rii, = 4, crosses = 8 {am = 0.10). 

convenient rather than accurate. The functions that we used have the form p{x) = , 
where u was chosen to cancel the divergent behavior around Aq. This weight function has 
the advantage that we can compute some of the integrals resulting from minimizing the 
distance function in Eq. |18| analytically. This is important since these integrals have to be 
computed very precisely for the method to work (the integrals have to be computed with 
hundreds of digits of precision). 

In order to get a measure of the accuracy of the polynomial approximation we used the 
distance function 5 = defined in Eq. |l^ with the weight function set to p{x) = 1. 
We have found that it is sufficient to get polynomials with 6 ~ 10^^; beyond that the 
roundoff errors become dominant. In figures |l| and | we plot the accuracy of the 



and 



"6 



" polynomials for = 1, 4 and 8 as the function of the polynomial order for 
am = 0.1. The rih = I case is of no practical interest, we show it only to provide a reference. 
The first thing we notice is that the level of the approximation for the final functions does 
not increase dramatically as we increase the breakup level. This is startling at first since 
the resulting polynomial has an order of ni, x k, with k the polynomial order. However, if 
we think that the level of the approximation is determined by the number of the coefficients 
varied in the minimization procedure this fact is no longer that mysterious. Secondly, we 
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Figure 2: Same as figure |l| but for the Q polynomials. 

would like to point out that for > 1 the accuracy of the P^^''^ and Q^'^''^ polynomials 
with ~ / is approximately the same. Thus, there is really no point in using polynomials 
with k and / vastly different since the level of the accuracy of the final result is going to be 
dominated by the smaller order polynomial. From figures |l], |], it is clear that in order to 
achieve the needed approximation level we have to use polynomials with k, I ~ 100 — 200. 
The necessary order increases with decreasing quark mass. 

Due to the errors introduced by the polynomials the stochastic estimator will not be one 
even when we do not change the configuration at all {Q = Q') . This is due to the fact that 
P (x) Q (x) P (x) is not exactly one. This problem can be largely corrected by rewriting the 
estimator as 

The right hand side of the expression above has only second order errors. It is a much 
better approximation of the quantity on the left hand side than the estimator that we get 
by simply replacing Qr with the polynomial approximation. This step is very important 
since the polynomial approximation would otherwise require much larger order polynomials. 

The necessary order for the polynomials P and Q vary with the quark mass but we found 
that in most cases fairly low orders are sufficient. In simulations with parameters listed in 
tables I and |I| the systematical errors from 128-196 order polynomials were at the same 
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Figure 3: Average computational cost (number of M^M multiplies) for different small polynomials: 
filled squares = 4, crosses nh = 8 {am = 0.04, k, I ~ 128 for large polynomials). 

order as numerical round-off errors. However this precision is usually unnecessary. All we 
need to know for the accept-reject step is whether the stochastic determinant estimate is 
smaller or greater than a random number; the precise value is not important. 

We used a two step approach: we first calculate the stochastic determinant using low order 
polynomials and we fall back, i.e recalculate the stochastic estimator with higher precision, 
only if the random number differs from the low order value by less than the estimated error. 
This method reduces cost if the fall-back rate is small, less or around 10%. To determine 
the optimum small polynomials we vary their order to minimize the average computational 
cost. 

We estimate the error the low order polynomials might cause from preliminary short runs. 
A typical short run has a couple of hundreds estimator measurements using the small and 
the exact (large) polynomials. From this we determine the error of the small polynomials 
such that the values predicted with small and large polynomials agree within this error at 
least 99.5% of the time in the production runs. 

We found that the order of the optimum small polynomials is independent of the breakup 
level, as we can see from figure ^, but the necessary order increases as we decrease the mass. 
The optimum low order polynomials we determined to be = 24, / = 25 for quark mass 
am = 0.1, A; = 32, / = 33 for am = 0.06 and = 48, / = 49 for am = 0.04. The fall- 
back rate, i.e. the rate when we had to repeat the stochastic estimator calculation with 
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higher order polynomials varied between 5% and 10%. The outcome of the accept /reject 
step changed even less frequently, about 10% of the time the fall-back occurred. 

The polynomial approximation is a major part of our simulation. Any improvement that 
reduces the necessary polynomial order or the fall-back rate will immediately reduce the 
computer time requirements. At present we are investigating several possibilities along this 
direction. 

IV. THE EFFECTIVENESS OF THE PARTIAL-GLOBAL STOCHASTIC UP- 
DATE 

The effectiveness of the PGSM algorithm depends on both the partial-global heat bath 
update and the stochastic estimator. The autocorrelation time of the simulation is the larger 
of the autocorrelation times of the heat bath and stochastic steps. If only a small part of 
the original links are updated, the fermionic matrix ratio {V')fl{V) is close to unity, the 
standard deviation and the autocorrelation time of the stochastic estimator is small. On 
the other hand the heat bath update is ineffective and has to be repeated many times to 
update the configuration. Inversely, if many links are changed at once, the heat bath update 
is effective but the standard deviation of the stochastic estimator increases, increasing the 
autocorrelation time again. We would like to find the optimal update where the simulation 
is most effective. 

We have measured the integrated autocorrelation time of the plaquette as the function 
of the number of links updated at one heat bath step, ins, both for n/ = 4 and Uf — 2 
flavor simulations. The estimates for Tauto in the following are based on runs that were 
80-100 times longer than Xauto itself (except at the largest Tauto values where the statistics 
is smaller). While that does not provide a very good estimate for the autocorrelation, it is 
sufficient to distinguish the heatbath versus stochastic estimator dominated regions. 

The autocorrelation time of the pure gauge partial-global heat bath update is easy to 
estimate. If at each step we update tnB links of a lattice of volume V with acceptance rate 
Ta, the autocorrelation time Tpg is related to the autocorrelation time of the whole lattice 
pure gauge heat bath update thb as 

W 

Tpg = THB- — -. (20) 

T^HBT^a 
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Figure 4: Autocorrelation of the = 4, am = 0.1 runs. Open squares: rih = ^ ■, filled squares: 
nfe = 8 determinant breakup. The lines correspond to the expected autocorrelation based on the 
heatbath update alone. 
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Figure 5: Acceptance rate of the = 4, am = 0.1 runs. Notation as in figure ^. 

In our simulations at /? = 5.65 on 8^ x 24 volumes thb ~ 10. The acceptance rate of the 
PGSM depends not only on tuB but on the determinant breakup parameter ni, as well. 

The simulation parameters are those listed in tables |I|, |I|. We considered = 4 and 8 
determinant breakup with the nj = 4, am = 0.1 runs. The open squares in figure || show 
the autocorrelation time for n^, = 4, the filled squares for n^, = 8. The dashed line is Tpg of 
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Figure 6: Autocorrelation of the nj = 2 runs. Open squares: Ub = 8, am = 0.1, filled squares: 
Uf, = 12, am = 0.1, crosses: = 12, am = 0.04 runs. For clarity the data points are slightly 
displaced horizontally. The lines corresponds to the expected autocorrelation based on the heatbath 
update alone. Solid line: Ub = 8, am = 0.1, dashed line: nb = 12, am = 0.04. 



Eq. |20| for Ub = 4, the solid line for = 8. For small tnB values both n;, = 4 and 8 follow 
the Tpg curve, indicating that the autocorrelation time is dominated by the heatbath step. 
The rift = 4 data break off around ins = 1000, the = 8 around ins = 2500 signaling 
that the autocorrelation time beyond that is dominated by the stochastic estimator. The 
acceptance rate, shown in figure ||, varies between 35% and 65% but does not indicate 
whether the autocorrelation is heatbath or stochastic estimator dominated. For an effective 
simulation one should choose t^B = 750 with Ub = 4 oi ins = 2000 with Ub = S. Which 
of these is more effective depends on the implementation of the polynomial approximation 
in the determinant breakup. In our simulations updating 2000 links with = 8 is the 
better choice. One should note that at this mass value the rif, = 4 determinant breakup is 
just barely adequate to guarantee the finiteness of the standard deviation of the stochastic 
estimator, one more reason to choose ri;, = 8 in the simulations. 

The Uf = 2 fiavor simulations are considerably more effective. In figure ^ we plot Tauto 
for the am = 0.1 mass with Ub = 8 and Ub = 12 and for the am = 0.04 quark mass 
with Ub = 12 determinant breakup. Note that the data points with ins = 6144 touch the 
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Figure 7: The acceptance rate of the Uf = 2 runs. Notation as in figure |6|. 

maximum number of links allowed on these 8^ x 24, close to 10 fm^ lattices. With both 
quark mass values only this last point shows any deviation from the Tpg heatbath curve. 
The acceptance rate is shown in figure |^. It is interesting to note that the am = 0.1 runs 
with Uf, = S determinant breakup have very similar acceptance rates to the am = 0.04, 
rib = 12 runs indicating how the determinant breakup should increase with decreasing quark 
mass. We found similar correspondence in other quantities, like the standard deviation of 
the stochastic determinant estimate or the average of the action difference AS". 

Measuring the autocorrelation time requires long simulations. One of our goals in this 
paper is to find an alternate quantity that is easy to measure but still indicates the crossover 
from the heatbath to stochastic estimator regions. We have considered the standard devia- 
tion of the stochastic estimator and also the standard deviation of AS* as defined in Eq. [l^. 
We decided to use the latter as AS* has an almost Gaussian distribution and its standard 
deviation is easier to estimate. In figure || we show both the nj = 4 and nj = 2 data for the 
autocorrelation, measured in units of Tpg, as the function of a^s = V < AS'^ > — < AS' >'^. 
As we have already expected from figures || and ||, most data points are consistent with 
Tanto/Tpg = 1, i-G. heatbath dominated. We see a gradual increase only at a as > 1-6. Based 
on figures [^J^ and || we conclude that the PGSM update is heatbath dominated if the ac- 
ceptance rate is around 50% or higher and the standard deviation of the action difference 
a AS is less than about 1.6. 

We conclude this section by translating the autocorrelation time measurements to com- 
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Figure 8: The autocorrelation time in units of r^g of Eq. 20 as the function of the standard deviation 
of the stochastic action, AS". Open and filled circles: = 4, am = 0.1, = 4 and 8 runs. Open 
and filled squares: = 2, am = 0.1, nf, = 8 and 12 runs. Crosses: nj = 2, am = 0.04, = 12 
runs. 

puter time requirements. In figure ^ we plot the computer time measured in fermionic matrix 
ikf^M multiplies to create configurations in the Uf = 2 fiavor simulations that are separated 
by 2rauto update steps. These estimates strongly depend on the polynomial approximation 
but still give a fair indication of the expected computer cost. One should note that in figure 
P we consider the time requirement of the stochastic estimator only. Updating the gauge 
field and evaluating the HYP smeared links can result in a 10-20% overhead on smaller lat- 
tices. In Ref. 1^ it was found that a HMC thin link staggered fermion update with ri/ = 4 
fiavors at similar physical parameter values to our am = 0.1 run requires about 1.2 x 10^ 
M^M multiplies to create independent configurations . A two fiavor run is faster but even 
than it is evident that on these 10 fm"^ lattices it is actually faster to create HYP smeared 
configurations with PGSM than think link ones with a small step-size algorithm. 

The autocorrelation measurements and computer time estimates presented above were 
based on runs on 10 fm^ lattices. The PGSM algorithm scales with the square of the lattice 
volume. To repeat the above measurements on 100 fm"^ lattices would increase the computer 
time by 100. A factor of 10 increase is due to the increased volume, and the other factor 
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Figure 9: Cost of creating independent configurations, measured in fermionic matrix multiplies. 
Notation is the same as in figure open squares: rib = 8, am = 0.1, filled square: Uf, = 12, 
am = 0.1, crosses: ni, = 12, am = 0.04 runs.. 

of 10 is coming from the increased autocorrelation time. Simulation cost at smaller lattice 
spacing but same physical volume increases only linearly with the volume as the number of 
links that can be effectively touched increases accordingly. 

V. CONCLUSION 

In this paper we discussed the practical implementation of the partial-global stochastic 
Metropolis (PGSM) algorithm for HYP link staggered fermions. The PGSM is a two step 
algorithm where a partial set of the original gauge links are updated with a heatbath or 
overrelaxed step and this proposed configuration is accepted or rejected according to a 
stochastic estimate of the fermionic determinant. Even though only a stochastic estimator 
is used in the Metropolis step, the algorithm satisfies detailed balance. 

The effectiveness of the algorithm depends on the matching of the dynamical action and 
the pure gauge action used in the first step of the algorithm, and on the stochastic estimator 
used in the Metropolis accept-reject step. The first condition can be met by separating the 
pure gauge action into a part that depends on the thin gauge links and matches the short 
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and/or long distance properties of the dynamical action, and a part that depends on the 
smeared links and compensates for the renormalization of the pure gauge coupling due to 
the dynamical fermions. Only the former is used to propose the new configuration, the latter 
is included in the accept-reject step. The stochastic estimator can be significantly improved 
by fermionic matrix reduction and determinant breakup. Our Uf = 2 fiavor simulations 
indicate that with these improvements up to 1 fm^ section of the lattice can be updated 
at one time. The update has to be repeated enough times to refresh the whole lattice, 
but beyond that the properties of the heatbath update determine the autocorrelation time. 
Combining the heatbath update with overrelaxation could provide an even more effective 
algorithm that decorrelates the configurations much faster than the traditional small step 
size algorithms. 
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